Registered Nurses in the United States and Territories

tidytuesday

Understanding wages for Registered Nurses.

Ted Laderas
10/5/2021

Research Question(s)

  1. Which states have the highest overall wages for registered nurses? When did this happen?
  2. Have wages increased overall for registered nurses across all states?

Loading Data

We’ll use the Tidy Tuesday code to directly load the data from the GitHub repository. We’ll also pass it into janitor::clean_names() to standardize the column names. (Life is too short to have to worry about whitespace and capitalization.)

# Packages used throughout this post
library(dplyr)
library(ggplot2)
library(heatmaply)

nurses <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-05/nurses.csv') %>%
  janitor::clean_names()
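To make the effect of janitor::clean_names() concrete, here is a minimal sketch on a made-up data frame. The column names and values below are invented for illustration and are not necessarily the raw headers of nurses.csv.

```r
library(janitor)

# Hypothetical raw column names with spaces and mixed case (made-up values)
raw <- data.frame(
  "State" = c("Oregon", "Guam"),
  "Hourly Wage Avg" = c(46.5, 27.1),
  "Total Employed RN" = c(37780, 750),
  check.names = FALSE  # keep the messy names exactly as written
)

# clean_names() converts the names to lower snake_case
cleaned <- clean_names(raw)
names(cleaned)
```

After cleaning, every column can be referred to without backticks or quoting, which is what makes the later dplyr and ggplot2 code easy to type.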

Initial EDA

We can see there are 22 columns overall. 21 of these are numeric.

skimr::skim(nurses)
Table 1: Data summary
Name nurses
Number of rows 1242
Number of columns 22
_______________________
Column type frequency:
character 1
numeric 21
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
state 0 1 4 20 0 54 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
year 0 1.00 2009.00 6.64 1998.00 2.00300e+03 2009.00 2015.00 2020.00 ▇▆▇▆▇
total_employed_rn 5 1.00 47703.88 50241.05 240.00 1.22100e+04 31160.00 60230.00 307060.00 ▇▂▁▁▁
employed_standard_error_percent 5 1.00 4.36 3.04 0.70 2.50000e+00 3.50 5.10 26.10 ▇▂▁▁▁
hourly_wage_avg 6 1.00 28.48 6.65 9.23 2.37000e+01 28.25 32.39 57.96 ▁▇▆▁▁
hourly_wage_median 6 1.00 27.86 6.72 8.64 2.30800e+01 27.58 31.72 56.93 ▁▇▇▁▁
annual_salary_avg 6 1.00 59248.30 13829.14 19190.00 4.93000e+04 58750.00 67377.50 120560.00 ▁▇▆▁▁
annual_salary_median 6 1.00 57957.92 13978.95 17970.00 4.79950e+04 57375.00 65987.50 118410.00 ▁▇▇▁▁
wage_salary_standard_error_percent 6 1.00 1.27 0.70 0.40 9.00000e-01 1.10 1.42 7.50 ▇▁▁▁▁
hourly_10th_percentile 6 1.00 20.23 4.66 6.38 1.68100e+01 20.04 23.54 36.62 ▁▆▇▃▁
hourly_25th_percentile 6 1.00 23.54 5.51 7.33 1.94700e+01 23.24 27.01 45.18 ▁▇▇▂▁
hourly_75th_percentile 6 1.00 32.92 8.07 10.04 2.72100e+01 32.61 37.33 71.07 ▁▇▅▁▁
hourly_90th_percentile 6 1.00 38.16 9.23 12.33 3.25100e+01 37.50 43.41 83.35 ▁▇▅▁▁
annual_10th_percentile 6 1.00 42087.70 9694.20 13260.00 3.49575e+04 41670.00 48955.00 76180.00 ▁▆▇▃▁
annual_25th_percentile 6 1.00 48968.81 11469.49 15260.00 4.04875e+04 48335.00 56195.00 93970.00 ▁▇▇▂▁
annual_75th_percentile 6 1.00 68464.53 16777.63 20890.00 5.65975e+04 67835.00 77637.50 147830.00 ▁▇▅▁▁
annual_90th_percentile 6 1.00 79367.01 19201.21 25650.00 6.76200e+04 78015.00 90290.00 173370.00 ▁▇▅▁▁
location_quotient 649 0.48 1.01 0.19 0.32 9.00000e-01 1.01 1.13 1.50 ▁▁▇▇▁
total_employed_national_aggregate 4 1.00 134075563.81 6133532.52 124143490.00 1.29059e+08 131713800.00 138885360.00 147838700.00 ▅▇▅▃▃
total_employed_healthcare_national_aggregate 4 1.00 7268640.12 943177.74 5854360.00 6.22654e+06 7250140.00 8076300.00 8727310.00 ▇▃▅▅▆
total_employed_healthcare_state_aggregate 2 1.00 134743.23 143540.40 110.00 3.34475e+04 87435.00 175292.50 844930.00 ▇▂▁▁▁
yearly_total_employed_state_aggregate 0 1.00 2387208.60 2774288.09 110.00 5.96520e+05 1557110.00 2888682.50 17382400.00 ▇▂▁▁▁
head(nurses)
# A tibble: 6 × 22
  state       year total_employed_rn employed_standar… hourly_wage_avg
  <chr>      <dbl>             <dbl>             <dbl>           <dbl>
1 Alabama     2020             48850               2.9            29.0
2 Alaska      2020              6240              13              45.8
3 Arizona     2020             55520               3.7            38.6
4 Arkansas    2020             25300               4.2            30.6
5 California  2020            307060               2              58.0
6 Colorado    2020             52330               2.8            37.4
# … with 17 more variables: hourly_wage_median <dbl>,
#   annual_salary_avg <dbl>, annual_salary_median <dbl>,
#   wage_salary_standard_error_percent <dbl>,
#   hourly_10th_percentile <dbl>, hourly_25th_percentile <dbl>,
#   hourly_75th_percentile <dbl>, hourly_90th_percentile <dbl>,
#   annual_10th_percentile <dbl>, annual_25th_percentile <dbl>,
#   annual_75th_percentile <dbl>, annual_90th_percentile <dbl>, …

Let’s look at how the rows are divided across years.

nurses %>%
  count(year)
# A tibble: 23 × 2
    year     n
   <dbl> <int>
 1  1998    54
 2  1999    54
 3  2000    54
 4  2001    54
 5  2002    54
 6  2003    54
 7  2004    54
 8  2005    54
 9  2006    54
10  2007    54
# … with 13 more rows

Hmmm. 54 entries per year: the 50 states plus D.C., the Virgin Islands, Puerto Rico, and Guam.

nurses %>%
  count(state)
# A tibble: 54 × 2
   state                    n
   <chr>                <int>
 1 Alabama                 23
 2 Alaska                  23
 3 Arizona                 23
 4 Arkansas                23
 5 California              23
 6 Colorado                23
 7 Connecticut             23
 8 Delaware                23
 9 District of Columbia    23
10 Florida                 23
# … with 44 more rows
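A quick way to confirm which entries are not among the 50 states is to compare the region names against R's built-in state.name vector. The sketch below uses a short hand-typed subset of names; with the real data, unique(nurses$state) would take the place of regions.

```r
# A few region names as they appear in the `state` column (subset for illustration)
regions <- c("Alabama", "District of Columbia", "Guam",
             "Oregon", "Puerto Rico", "Virgin Islands")

# `state.name` is a built-in character vector holding the 50 U.S. state names
non_states <- setdiff(regions, state.name)
non_states
```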

The mean number of employed registered nurses across all states shows an upward trend, except for a blip in 2012-2013.

nurses %>%
  group_by(year) %>%
  summarize(mean_employed_rn = mean(total_employed_rn, na.rm=TRUE)) %>%
  ggplot() +
  aes(x=year, y=mean_employed_rn) +
  geom_line()
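One way to pin down where a blip sits, rather than reading it off the plot, is to look at year-over-year differences. This is only a sketch with made-up numbers; with the real data, the summarized table of yearly means would take the place of yearly.

```r
# Made-up yearly means, only to illustrate the calculation
yearly <- data.frame(
  year = 2010:2014,
  mean_employed_rn = c(48000, 49000, 48500, 48400, 50000)
)

# Change relative to the previous year; the first year has no predecessor
yearly$change <- c(NA, diff(yearly$mean_employed_rn))

# Rows where the mean dropped compared with the year before
drops <- yearly[which(yearly$change < 0), ]
drops
```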

Let’s visualize whether hourly wages are increasing or decreasing across the dataset by making a heatmap, with year on the x-axis, state on the y-axis, and the fill mapped to hourly_wage_median:

nurses %>%
  mutate(state=forcats::fct_rev(state)) %>%
  ggplot() +
  aes(x=year, y=state, fill=hourly_wage_median) +
  geom_tile()
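The fct_rev() call is presumably there because ggplot2 places the first level of a discrete y-axis at the bottom, so reversing the levels lets the states read alphabetically from top to bottom. A small sketch of its effect on the level order:

```r
library(forcats)

states <- factor(c("Alabama", "Alaska", "Wyoming"))

# Levels are alphabetical by default
levels(states)

# Reversed levels: "Alabama" is now last, so it is drawn at the top of the y-axis
reversed <- fct_rev(states)
levels(reversed)
```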

Scaling the data by state

Looking for trends in the nurses data, let’s scale the wages within each state so we can emphasize whether there were increases or decreases within each state. We’re just looking for trends here and whether the slope of these trends is the same for each state.

Note that by scaling within a state (transforming each value to a z-score), we are losing information, but we can see whether wages are steadily increasing for each of the states/territories.
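To show what is lost, here is a minimal sketch of within-state z-scoring on two made-up wage series. State "B" pays twice as much and grows twice as fast in absolute terms as state "A", yet both end up with identical scaled values. The data and the helper name z_score are invented for illustration.

```r
# Two made-up wage series: same shape, different level
wages <- data.frame(
  state = rep(c("A", "B"), each = 4),
  year  = rep(2017:2020, times = 2),
  hourly_wage_median = c(20, 22, 24, 26,   # lower-wage state
                         40, 44, 48, 52)   # higher-wage state
)

# z-score: (value - mean) / standard deviation
z_score <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# Apply the z-score separately within each state
wages$scaled_income <- ave(wages$hourly_wage_median, wages$state, FUN = z_score)

wages
```

After scaling, the heatmap can only show the shape of each state's trend, not how high wages are or how many dollars they grew by.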

In general, with some exceptions (Guam and the Virgin Islands), registered nurses in most states and territories saw an increase in median hourly wages from 1998 to 2020.

nurses %>%
  mutate(state=forcats::fct_rev(state)) %>%
  group_by(state) %>%
  mutate(scaled_income = scale(hourly_wage_median)) %>%
  ggplot() +
  aes(x=year, y=state, fill=scaled_income) +
  geom_tile(color="grey10") +
  scale_fill_distiller() +
  bplots::theme_avenir()

So far we have looked at the median hourly wage. The next question is whether these trends look the same for the 10th and 90th percentiles of registered nurse wages.

10th Percentile

nurses %>%
  mutate(state=forcats::fct_rev(state)) %>%
  group_by(state) %>%
  mutate(scaled_income = scale(hourly_10th_percentile)) %>%
  ggplot() +
  aes(x=year, y=state, fill=scaled_income) +
  geom_tile(color="grey10") +
  scale_fill_distiller() +
  bplots::theme_avenir() +
  theme(axis.text.x=element_text(angle=90))

90th Percentile

For the most part, if you are in the 90th percentile of hourly wages, you have seen a leveling off of income after about 2008. After 2008, 90th-percentile income seems fairly static.

nurses %>%
  mutate(state=forcats::fct_rev(state)) %>%
  group_by(state) %>%
  mutate(scaled_income = scale(hourly_90th_percentile)) %>%
  ggplot() +
  aes(x=year, y=state, fill=scaled_income) +
  geom_tile(color="grey10") +
  scale_fill_distiller() +
  bplots::theme_avenir() +
  ggtitle("90th-percentile RNs have slower increases in income than the 10th percentile")
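The three scaled heatmaps differ only in which wage column is scaled, so the shared steps could be wrapped in a function. The helper below, scaled_heatmap(), is a hypothetical name and not part of the original analysis; it is a sketch that leaves out the bplots theme so that it only needs dplyr, forcats, and ggplot2. The toy data is made up.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: scale one wage column within each state and
# draw the year-by-state heatmap.
scaled_heatmap <- function(data, column) {
  data %>%
    mutate(state = forcats::fct_rev(state)) %>%
    group_by(state) %>%
    # as.numeric() drops the one-column matrix that scale() returns
    mutate(scaled_income = as.numeric(scale({{ column }}))) %>%
    ungroup() %>%
    ggplot() +
    aes(x = year, y = state, fill = scaled_income) +
    geom_tile(color = "grey10") +
    scale_fill_distiller()
}

# Made-up data to show the call
toy <- data.frame(
  state = rep(c("Oregon", "Guam"), each = 3),
  year = rep(2018:2020, times = 2),
  hourly_90th_percentile = c(55, 57, 60, 30, 30, 31)
)
p <- scaled_heatmap(toy, hourly_90th_percentile)
```

With the real data the calls would look like scaled_heatmap(nurses, hourly_10th_percentile) and scaled_heatmap(nurses, hourly_90th_percentile), with titles and themes added using + as usual.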

Making heatmaps with dendrograms

Pivoting the data to be wider

One question we might ask is whether states group together in terms of their wage increases.

We can explore this by pivoting the data into a matrix suitable for heatmaply::heatmaply() from the {heatmaply} package.

Here, we take hourly_wage_median and use it in the values of our matrix. Our rows correspond to state and our columns correspond to year.

nurse_median_frame <- nurses %>%
  select(state, year, hourly_wage_median) %>%
  arrange(year) %>%
  tidyr::pivot_wider(names_from = year, values_from = hourly_wage_median) 

# Convert to a matrix first: setting row names on a tibble is deprecated
nurse_median_matrix <- as.matrix(nurse_median_frame[,-1])
rownames(nurse_median_matrix) <- nurse_median_frame[["state"]]

head(nurse_median_matrix)
            1998  1999  2000  2001  2002  2003  2004  2005  2006
Alabama    17.63 18.09 19.60 19.99 20.60 20.81 21.23 22.43 23.52
Alaska     22.37 23.02 24.90 26.13 26.45 26.47 28.69 28.54 30.41
Arizona    19.37 20.26 21.97 22.23 23.35 23.88 25.12 26.90 28.06
Arkansas   16.66 17.18 18.02 18.44 19.20 19.98 21.17 22.63 23.62
California 23.95 25.12 26.50 27.36 28.38 29.47 31.61 33.15 35.23
Colorado   19.79 20.47 21.77 22.56 23.17 23.88 25.60 26.91 28.15
            2007  2008  2009  2010  2011  2012  2013  2014  2015
Alabama    24.92 25.80 26.48 26.44 26.41 26.02 26.20 26.39 26.70
Alaska     33.48 34.42 35.33 37.39 38.67 38.73 40.08 41.12 42.37
Arizona    29.17 30.59 31.78 33.11 34.42 34.24 34.14 34.00 34.38
Arkansas   24.17 24.78 25.10 25.28 25.90 26.16 26.56 26.72 26.76
California 36.77 38.93 39.86 41.03 42.51 43.88 45.34 46.38 48.27
Colorado   29.69 30.76 31.74 31.81 32.35 32.22 32.73 32.83 32.95
            2016  2017  2018  2019  2020
Alabama    26.68 27.20 27.85 28.27 28.19
Alaska     41.01 41.45 42.14 43.54 45.23
Arizona    34.94 35.70 36.43 36.93 37.98
Arkansas   27.26 27.68 28.68 29.01 29.97
California 48.30 48.43 50.20 53.18 56.93
Colorado   33.05 34.27 35.03 36.10 36.78
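The skim summary earlier reported 6 missing values for hourly_wage_median, so a few cells of this matrix may be NA, and missing cells can affect the distance calculation behind a dendrogram. A minimal sketch for locating them, using a made-up matrix in place of nurse_median_matrix:

```r
# Made-up 3 x 3 wage matrix with one missing cell
m <- matrix(
  c(17.6, 18.1, 19.6,
    22.4,   NA, 24.9,
    19.4, 20.3, 22.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), c("1998", "1999", "2000"))
)

# Row/column positions of every missing cell
missing_cells <- which(is.na(m), arr.ind = TRUE)

# The same information as readable labels
missing_labels <- data.frame(
  state = rownames(m)[missing_cells[, "row"]],
  year  = colnames(m)[missing_cells[, "col"]]
)
missing_labels
```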

Heatmap with No scaling

We can now ask questions about the actual income values. We tell heatmaply to compute a dendrogram only for the rows (states) to look for clustering patterns.

Note that we have to set the scale argument to "none" here.

heatmaply(nurse_median_matrix, dendrogram = "row", 
          Colv = c(1:23), scale="none",
          main = "Oregon, California, and Hawaii have the highest median wage from 2017-2020")
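To give a sense of what a row dendrogram represents, here is a sketch of hierarchical clustering on the rows of a small made-up matrix using base R. It assumes Euclidean distance via dist() and the default hclust() linkage, which I believe mirror heatmaply's defaults, but treat that as an assumption rather than a statement about the package.

```r
# Made-up median wages for four regions over three years
m <- matrix(
  c(50, 52, 55,
    48, 51, 54,
    27, 28, 29,
    26, 27, 28),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("High1", "High2", "Low1", "Low2"),
                  c("2018", "2019", "2020"))
)

# Pairwise distances between rows, then agglomerative clustering
row_tree <- hclust(dist(m))

# Cut the tree into two groups to see which rows cluster together
groups <- cutree(row_tree, k = 2)
groups
```

Because the values are unscaled, rows are grouped mainly by wage level: the two high-wage rows land in one group and the two low-wage rows in the other.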

Scaling by state

If we are interested in relative (scaled) values, the dendrogram is a little less interesting. Overall you can see that all states showed an increase in hourly median wage over the years.

heatmaply(nurse_median_matrix, dendrogram = "row", 
          Colv = c(1:23), scale="row", 
          main="Upward trends overall in terms of hourly median wage")
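Row scaling can be sketched directly on a matrix. The example assumes that scale="row" centers and scales each row to a z-score, which is how I read the heatmaply argument. The data is made up. Two steadily rising rows become identical regardless of wage level, while a row with a late jump keeps a distinct shape.

```r
# Made-up wage matrix: two steady risers at different levels, one late jump
m <- matrix(
  c(20, 22, 24, 26,
    40, 44, 48, 52,
    30, 30, 30, 36),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("LowSteady", "HighSteady", "LateJump"),
                  c("2017", "2018", "2019", "2020"))
)

# scale() works column-wise, so transpose, scale, and transpose back
m_scaled <- t(scale(t(m)))
round(m_scaled, 2)
```

This is one reason the row-scaled dendrogram is less informative here: when nearly every state rises steadily, the scaled rows look alike.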

Conclusions

This was a nice dataset to get back into Tidy Tuesday.

Citation

For attribution, please cite this work as

Laderas (2021, Oct. 5). Edward Hillenaar, MSc, candidate PhD: Registered Nurses in the United States and Territories. Retrieved from https://laderast.github.io/articles/2021-10-05-registered-nurses/

BibTeX citation

@misc{laderas2021registered,
  author = {Laderas, Ted},
  title = {Edward Hillenaar, MSc, candidate PhD: Registered Nurses in the United States and Territories},
  url = {https://laderast.github.io/articles/2021-10-05-registered-nurses/},
  year = {2021}
}